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Abstract 

Solutions to a linear Diophantine system, or lattice points in a rational convex 
polytope, are important concepts in algebraic combinatorics and computational 
geometry. The enumeration problem is fundamental and has been well studied, 
because it has many applications in various fields of mathematics. In algebraic 
combinatorics, MacMahon partition analysis has become a general approach for 
linear Diophantine system related problems. Many algorithms have been devel- 
oped, but "bottle neck" problems always arise when dealing with complex prob- 
lems. While in computational geometry, Barvinok's important result asserts the 
existence of a polynomial algorithm when the dimension is fixed. However, the 
implementation by the LattE package of Loera et. al. does not perform well 
in many situations. By combining excellent ideas in the two fields, we general- 
ize Barvinok's result by giving a polynomial algorithm for MacMahon partition 
analysis in a suitable condition. We also present a Euclid style algorithm, which 
might not be polynomial but is easy to implement and performs well. As appli- 
cations, we contribute the generating series for magic squares of order 6. 

Mathematics Subject Classification. Primary 05-04, secondary 05A15, 52B99. 
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1 Introduction 

Linear Diophantine system is one of the most fundamental concepts in mathematics. 
One basic problem is to determine the set of non-negative integer solutions of a system 
of linear equations (or inequalities) Ax = b for suitable integral matrix A and vector 
b. In the context of geometry, the problem is to determine lattice points in a rational 
convex polytope P = {a : Aa = b, a > 0} specified by A and b. That is, we need 
to determine the set P D Z n . If b = then the linear Diophantine system is called 
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homogeneous, and the corresponding P is called a rational cone. This basic problem 
received much attention for its wide application in many fields of mathematics. Many 
theories have been developed but this paper will address on the algorithmic aspect. 
There are many algorithms dealing with linear Diophantine system related problems. 
Two algorithms in two different fields have great advantages. One is Barvinok's poly- 
nomial algorithm in computational geometry [6] and the other is the author's partial 
fraction algorithm [18] in algebraic combinatorics. 

We will use the short hand notation X — X i X<2 ' ' ' X °™ throughout this paper. 
To understand the complexity of this problem, let us specify that A is a r x n matrix 
of rank r and consider the homogeneous case. The structure of the Z-solutions {a G 
Z n : Aa = 0} is simple, since it forms a subgroup of II 1 with n — r generators, which 
can be obtained through Hermite normal form. The structure of nonnegative integer 
solutions £ = {a 6 N™ : Aa = 0} is only a free commutative monoid (semigroup with 
identity). There is no simple way to enumerate elements of E, which is equivalent to 
the construction of the rational generating series 

*(*) = E (X ; A,0) = |>« = 

where ranges over all extreme rays of E and P(x) might be a monster polynomial. 
See [TTf Ch. 4.6] and combinatorial theories developed there. Many practical problems 
only need some specializations of E(x), such as E(q, q, . . . , q). It is worth mentioning 
that there is a beautiful reciprocity theorem for rational cones due to Stanley, which 
gives simple connection between nonnegative solutions and positive solutions. 

Our algorithms are under the framework of algebraic combinatorics, but borrow 
some beautiful ideas from computational geometry. The heart problem is to compute 
the constant term in A = (Ai, . . . , X r ) of an Elliott-rational function, written as 

pry (]) 

a (1-M 1 )(l-M 2 )---(1-M n y { ) 

where L is a Laurent polynomial and Mi are monomials for all i. Following [18], here 
we specify a field of iterated Laurent series, called working field, to clarify the series 
expansion of rational functions. It was George Andrews who found that MacMahon's 
partition analysis could be ideally combined with computer algebra for dealing with 
linear Diophantine system related problems [3]. Andrews and his coauthors has pub- 
lished a series of 12 papers in this topic. The first such algorithm was developed by 
Andrews et al. implemented by the Mathematica package Omega [I] and an improve- 
ment appears in [5]. A big progress is the author's partial fraction algorithm [T8] . 
where the field of iterated Laurent series was introduced and several bottle neck prob- 
lems were solved. By using iterated Laurent series, factors like (1 — Ai/A2) _1 have 
series expansions and are allowed to appear in the computation. The main body of 
Xin's algorithm uses partial fraction decomposition and read off their constant terms 
separately. The algorithm is simple and implemented by the updated Maple package 
E112. In computing partial fraction decompositions, bottle necks resist for multiple 
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roots and nonlinear factors in the denominator. We find that these bottle necks can 
be solved using ideas from computational geometry. 

A simpler model has been extensively studied earlier in computational geometry. 
Let P be as above specified by integral matrix A = {a,ij) rxn and nonzero vector b. 
Then it is well-known that the generating polynomial for P D TP 1 can be written as a 
constant term: 



So this is just the special case of ([I]) when the numerator L is a monomial. Geometers 
are more interested with the specialization of Xi — 1 for all i, which gives the number 
of lattice points in P. The most important result in this field is due to Barvinok, who 
developed a polynomial algorithm when the dimension is fixed [6] in 1994. Barvinok's 
algorithm is hard and was implemented by the LattE package by Leora et al. [13] in 
2004. An improvement was given in [UJ. The readers are referred to [13] for related 
references and [7] for related applications. Xin's algorithm is not polynomial, but the 
E112 package has better performance than the LattE package in some situations. The 
two algorithms are very different in nature. See Section 2. We find that we can do 
better if combine beautiful ideas of the two algorithms. 

The paper is organized as follows. Section 1 is this introduction. Our ultimate goal 
is to develop a classic algorithm for this subject in the near future. Section 2 introduces 
and compares the ideas of Barvinok's polynomial algorithm and Xin's partial fraction 
algorithm. Section 3 includes one of the two major contributions in this paper. We 
extends Barvinok's algorithm for the multivariate case, which give rise to a polynomial 
algorithm for MacMahon partition analysis. We do not give implementation of this 
algorithm since it is not a easy task and there are much room for improvements. Section 
4 includes the other major contribution. We develop a Euclid style algorithm with 
an easy implementation by the Maple package CTEuclid. This algorithm performs 
well and solves several bottle-neck problems in the E112 package. In Section 5, we 
give an introduction for CTEuclid with concrete examples for the sake of clarity. We 
explain the flexibility of our algorithm and our strategy for benchmark problems. As 
an application, we give the first solution for the generating function of order 6 magic 
squares. 

2 Comparison of Barvinok's polynomial algorithm 
and Xin's partial fraction algorithm 

Barvinok's algorithm is in the view of computational geometry and Xin's algorithm 
is along the line of MacMahon partition analysis in algebraic combinatorics. In this 
section we compare the two algorithms and conclude that a better strategy is to combine 
the nice ideas of the two algorithms. It is better to give a brief description of the two 
algorithms. We follow notations in the introduction. 
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Barvinok's algorithm computes the generating polynomial E(x; A, b) as a sum of 
rational functions and then evaluate at Xi — 1 for all i. The main body of the algorithm 
can be summarized in the following theorem. 

Theorem 1 (Barvinok). There is a polynomial time algorithm to write E(x;A, 0) as 
a sum of N simple rational functions, where N is a polynomial in the bytes of A. 

The theorem can be shown by three facts. 1) Rational cones are signed-decomposed 
into simplicial cones in polynomial time. 2) Barvinok made a key observation that a 
simplicial cone can be decomposed in polynomial time to unimodular simplicial cones. 
3) The generating function for a unimodular simplicial cone is simple. 

Algorithm Barvinok for computing E(x; A, b)\ _ : 

1. By using Brion's theorem, we can write 

E(x;A,b) = ^V (i) £(z; A«,0), 

i 

where the summands corresponds to vertex cones and the sum ranges over all 
vertices of P. 

2. Apply Theorem [1] for each E(x\ A^\ 0). 

3. Finally, take limits when Xi — > 1 for all i, as we shall discuss in Section 3. 
The readers are referred to [6] and [13] for detailed explanation. 

Xin's algorithm computes the constant term of an Elliott rational function in ([1]) 
regarded as an iterated Laurent series. The main body of the algorithm is to take 
constant term in a single variable. 

Algorithm XinPF 

1. For each Elliott-rational functions as a summand, successively choose a variable 
and take the constant term as in the next step. Note that after specifying a 
working field, we are taking constant term in a set of variables. 

2. When taking constant term in A of an Elliott-rational function, computing the 
partial fraction decomposition and read off the constant term in A. 

The comparison 

From the above descriptions, we see that the two algorithms are very different. 

1. The basic elements of Barvinok's algorithm is rational cones while that of Xin's 
algorithm is Elliott rational functions, which is a larger class of objects. 

2. By performing on rational cones, Barvinok's algorithm avoids the convergence 
problem. Xin's algorithm settling the convergence problem by introducing the 
field of iterated Laurent series as a framework. 
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3. Barvinok's algorithm is polynomial while Xin's is not. 

4. Xin's algorithm deals with the numerator uniformly while Barvinok' does not. 

5. The application of partial fraction decompositions has much more flexibility than 
rational cone decompositions under our framework. 

Although Barvinok's algorithm is polynomial, the application of Brion's theorem 
maybe very costly if the number of vertices is large. Indeed, Xin's E112 package 
has better performance than the LattE package in many situations, such as the Sdd5 
problem. Because of the freedom of Xin's algorithm, we conclude that a better strategy 
is to embed some of Barvinok's ideal to Xin's frame work. This leads to a polynomial 
algorithm for MacMahon partition analysis in Section 3, and a Euclid style algorithm 
in Section 4, which solves two bottle neck problems in the step of partial fraction 
decomposition. 



3 A polynomial algorithm for MacMahon partition 
analysis in theory 

Let us describe the heart problem of this paper more clearly as follows. Given an 
Elliott rational function E = E(xi,X2, ■ ■ ■ ,x m ) in the form 

T 

E 



{l-M x ){l-M 2 )---{l-M n Y 

where L is a Laurent polynomial and Mj are monomials. Let {Ai, . . . , A r } be a subset 
of {xi, . . . , x m }. We need to compute the constant term of E 

CT E — a simple representation free of the A's, 

Ai , . . . , A r 

when E is regarded as an iterated Laurent series. 



3.1 The big picture 

For simplicity we assume E has the natural series expansion in this section. If use 
terms in the next section, E is called in its proper form. To clarify the situation, write 
Mj = Mj(x)A c i where A a = A^A^ 2 • • • X? and Mj(x) is free of A. Let A = (ci, . . . , c n ) 
be the matrix with the jth column Cj. Assume the rank of A is r so that the solution 
space of Aa = has dimension d = n — r. Polynomial time means that the running 
time is polynomial in the bytes 0(nr log (max A)) of A, and the bytes of L, when the 
dimension d is fixed. Since the bytes of L is linear in the number of terms of L, we 
may assume L is a monomial. 

The big picture of the algorithm is as follows. We are indeed extending Barvinok's 
algorithm to the multivariate case. 
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Add slack variables z\ , . . . , z n so that 



E' 



L 




E. 



li; ,(1 -■ : ; ,U ; (,-).V ) 



=1 



This step is easy but necessary in the general situation. 

2. Eliminate \ for all % to get a big sum of polynomial size. For instance, assume 
L is a monomial and write Zj = zjMj(x). Then CT^E' corresponds to the 
generating function for lattice points in a rational convex polytope. Applying 
Brion's theorem and then applying Theorem [T] will give a sum of N' terms where 
N' is polynomial in the bytes of A. 

3. Eliminate all the slack variables Z\. This can also be done in polynomial time, as 
we shall elaborate next. 

Note that the proposed second step is in the spirit of Barvinok's algorithm. It has 
much more room for improvements. 

3.2 Eliminating the slack variables 

Algorithm for eliminating the slack variables could be think as an independent subject, 
so let us clarify the problem. Our input is a rational function Q(xi, . . . , x m ; z±, . . . , z n ) 
written as a (big) sum of simple rational functions: 



Our output will be a representation of Q(x±, . . . , x m \ 1, . . . , 1), which is known in 
priori to be well defined. The z's are called the slack variables. In particular the output 
is a number for m = and a single rational function for m = 1. We do not hope a 
single rational function for m > 2. 

It is not clear how to eliminate even when m = 0: i) combining the terms to 
a single rational function will get a monster numerator that can not be handled by 
the computer; ii) direct substitutions of Zi — 1 for all i does not work for possible 
denominator factors like 1 — z\Zi in some of the terms. 

The algorithm we present here is inspired by the work from computational geometry: 
the m = case was given a nice solution as a basic step in Barvinok's algorithm for 
lattice point counting. Though the idea works for general rational functions, we shall 
concentrate on Elliott-rational functions, that is 



where are monomials in x, Lj are Laurent polynomials. Such rational functions 
arose from MacMahon partition analysis. In particular, the m = 1 case corresponds to 
the computation of Ehrhart series. 




simple numerator 




(1 - M a z B ")(l - M i2 z B ») •••(!- M id z B ^) ' 
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The algorithm consists of two steps: First reduce the number of slack variables to 
1, and then use Laurent series expansion to compute separately for each term. 

Reduce the number of slack variables to 1. Calculating Q(x; 1, 1, . . . , 1) is equivalent to 
evaluating the limit as Zi goes to 1 for all i. Our first step is to reduce the number of 
slack variables to 1. This is done by finding a suitable integer vector A and making the 
substitution — >■ t Xi . In order to do so, A must be picked such that there is no zero 
denominator in any term, i.e., for every i and j we can not have both = 1 and the 
inner product (A, Bif) = 0. Barvinok showed such A can be picked in polynomial time 
by choosing points on the moment curve. Loera et al. [13] suggested to use random 
vectors to avoid large integer entries. 



Using Laurent series expansion. Now we need Q(x; t Xl , . . . , t Xn 

T I J-Al J-A n > 

±Ji \*L,V , . . . , V 



, where 

t=i 



Q(x;t*\...,t^) = ^ 



rj(i _ t&^Mij) ' 

An obvious way is to make the substitution t — 1 + s. Then we have 
Q = Q(x;(l + s) Al ,...,(l + s) A ' 1 ) =CTQ(x;(l + s) Al ,...,(l + S 



t=i 



\A„^ 



s=0 



where we are taking constant term of a Laurent series in s. The linearity of the constant 
term operator allows us to compute separately: 

Q(x;l,...,l)=y:CT i - fc(1 + s)AV - (1 + s)i " ) 



n(i-(i+s)<w^.) • 

The substitution t = 1 + s seems natural and works fine for the m = case, where 
we only need do polynomial division of a single variable. For m > 1, we find it better 
to make the exponential substitution t = e s , which leads to 



L{x~e XlS 6 AriS ) 
Q{X] !. • • • > 1) = iJ OT n(1 '_ U)/) • 

Proposition 2. Let L(x; z) be a Laurent polynomial, Mj monomials in x and bj inte- 
gers. The constant term 

CT L(x;e AlS ,...,e A " s ) 
s 11 -=i(l - e^M,) 

can be efficiently computed as a sum of at most (rw/21) s ^ m P^ e rational functions. 

Proof. By rearranging the denominators, we may assume that Mi = M 2 = ■ ■ ■ = M r = 
1 and all the other Mj are not 1. It is easy to see that 

sT .L(x; e XlS , . . . , e XnS ) 



nUCi-^'M,-; 
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is a power series in s, so that we are indeed taking coefficient of s r in a power series. 
To expand, we only need the following two formulas: 



V -—s n = -1 + -s - —s 2 + — s 4 —s 6 + - s 8 + ■ 

Z-^ nl 9 19 79n 3n9zm i9nci«nn 



1 - e s ^ n\ 2 12 720 30240 1209600 

n>0 



1-e'M (1 - M)(l - ^(e- - 1)) ^J(l-M) 
The B n are the well-known Bernulli numbers. The computation 

J ^ = I [ ~\ — s ™ = c ™ sn n ^S ner degree terms 

j=i e 3 j=l n>0 n=0 

is easy since this only involves univariate polynomial multiplication. Thus we can write 
r L(x; e XlS , . . . , e XnS ) 



nJ =1 (l-^*^i) n>0 n>0 i=r+l 

= E sn E l ^y ni n c nj{Mj)b™ j . 

n>0 rio+»^l+n r+ l^ yn d =n j=r+l 

It follows that the desired constant term is a sum of £ no (x)c' ni YYj= r +i ^{M^b™ . The 
number of terms we obtained is equal to which is the number of nonnegative 

integer solutions of n + ri\ + n r+ i + ■ ■ ■ + n& — r. 

The proposition follows since reaches maximum at r = \d/2~\. □ 

Using the exponential substitution, we only need two formulas for series expansion 
and we can store in advance those coefficients for n < d. This is much better than 
using the substitution t = 1 + s, especially for m > 1. In the m = 1 case, we can expect 
to obtain a single rational functions, since the computer performs well on univariate 
rational functions. 

The only problem for this approach is the large integer problem. It seems unavoid- 
able that some of the bij = (A, might be large. This results in huge numbers since 
our formula involves 6™ for n < d. This problem is avoidable by modular a reasonably 
large prime number (> d), provided that we know some information of the final output 
in advance. This is indeed the case since our problems are usually combinatorial and 
the final output are nice in some sense. The m = case problem usually computes the 
number of lattice points in a rational convex polytope. There are methods to estimate 
this number. For the m = 1 case we usually compute the Ehrhart series as in the next 
subsection. Such generating functions seems always have simple denominator and its 
numerator has not too large integer coefficients. Thus we can do the constant term 
extraction modular a chosen large prime. If necessary, we can do the computation 
several times using different large primes and then use the Chinese remainder theorem 
to construct the final output. 
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3.3 Direct computation of the Ehrhart series 

Given a rational polytope P = {a : Aa = b, a > 0} C M. n , the function 

i p (jfe) := #(kP n Z n ) = #{a G Z ra ) : = kb, a> 0} 

defined for positive integer k was first studied by E. Ehrhart [5] . It is call the Ehrhart 
polynomial when the vertices of P are integral and is call the Ehrhart quasi-polynomial 
for arbitrary rational polytopes [TTJ Ch. 4]. For us it is easier to describe using 
generating functions. The Ehrhart series of P defined by 

jh?) = X>(*)?* 

fc>0 

is an Elliott-rational function. It has close connection with Hilbert series for some 
graded algebra. 

An important problem is to compute the Ehrhart quasi-polynomial of given P. 
Earlier method is to compute ip(k) for sufficiently many k and then use the Lagrange 
interpolation formula to construct ip(k). We can compute the Ehrhart series directly 
by the following constant term representation. 

Ip(q) = CT n ^ (i - A?1J A?J - - - ^— ^ x - _ ^ — 

This corresponds to a rational cone or the homogeneous system (A,—b)(3 = 0. This 
leads to a combined way for Ehrhart series computation: Use LattE to do the rational 
cone decomposition and then use our way of eliminating the slack variables. There is 
no implementation for this approach yet. We remark that a similar idea was proposed 
in [13] to avoid the use of Brion's theorem. But they only wish to compute ip(k) for 
particular k, or equivalently, compute ip(l). 

Many benchmark problems are related to the computation of Ehrhart series. The 
counting of magic squares and its variations is one of the common topic in both combi- 
natorics and computational geometry. The definition of magic squares are different in 
different literature. Here an n by n nonnegative integer matrix M = {di t j) nxn is said to 
be a magic squares with magic sum m if its row sum, column sum, and two diagonal 
sum are all equal to m. That is, the order n magic square polytope MS n is defined by 
the following linear constraints: 

+ Qi,2 + • • • + a>i,n = 1, for 1 < % < n 
a i,j + a 2,j + ■ ■ ■ + a n ,j — 1, for 1 < j < n 

0-1,1 + «2,2 + • • • + CL n ,n = 1, 0>n,l + dn-l?. + " * * + CLl,n = 1- 

The determination of lMS n {<l) is known for n — 3, and for n = 4. Many algorithms 
meet trouble for the n = 5 case. See e.g., [2]. The first solution for the order 5 magic 
squares was reported in 
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Our approach is along the line of MacMahon partition analysis. Let A» index the 
i-th row equation, let \ij index the j-th column equation, and let v\ and z/ 2 index the 
two diagonal equations. Then it is not hard to see that 

EE II a? «9 m = CTF n (x,q;X,^u), 

m=0 M l<ij<n 

where the second sum ranges over all magic squares M with magic sum m, and 
F n (x, g; A, a*, „) = - _ ^ Ai ^ N)i/f+J=n+1) J _ q{Xi . . . . . . • 

In particular, setting a;^- = 1 gives the generating function for I MSn (q). 
I msM=CT J] — — (J-y-^— 



Though we believe that the order 6 magic squares problem should be computed fast by 
the proposed approach here, we would like to introduce a more elementary approach 
in the next section. 



4 A Euclid style algorithm for MacMahon partition 
analysis 

In the big picture of subsection 3.1, we mentioned that the second step may have 
different approaches. The proposed method is not easy to implement and is not ideal. 
In this section we provide a Euclid style algorithm. This algorithm is elementary, deal 
with the numerator uniformly and perform well in practice. 

The Euclid style algorithm is along the line of Xin's partial fraction algorithm, 
so it is time to explain briefly the field of iterated Laurent series. We use the list 
vars = [xi,X2, ■ ■ ■ ,x n ] to define the working field K = Q((x n ))((x n -i)) ■ ■ ■ ((xi)). See 
[T8] for detailed explanation. Here we only need the fact that every monomial M^l 
is comparable with 1 in K by the following rule: find the smallest variable Xj appear 
in M, so deg x . M = for all i < j. If deg x . M > then we say M is small, denoted 
M < 1, otherwise we say M is large, denoted M > 1. Thus we can determine which 
of the following two series expansion holds in K. 



1 



1 - M 



Mm , if M < I] 

y^-^T—T, if Af > 1. 



m>0 

1 V ^ 



-Mil- II M) ^ M m +! 
When expanding E as a series in K, we usually rewrite E in its proper form: 

E 



;i-M 1 )(l-M 2 )---(l-M n 
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where L is a Laurent polynomial and Mj < 1 for all i. Note that the proper form of E 
is not unique. For instance 1/(1 — x) = (1 + x)/(l — x 2 ) are both proper form. 

Algorithm CTEuclid for the second step in subsection 3.1. 

2-1 For each Elliott-rational functions as a summand, successively choose a variable, 
say x = X£, and take the constant term in x. Thus we reduce to taking constant 
term in a single variable. 

2-2 Reduce CT^. E as a sum of contributions of single denominator factors. 

2-3 The contribution of a single denominator factor can be recursively computed. 

Both last two steps contains some new features. 

4.1 A reduction to the contribution of a single factor 

To take constant term in x, we shall write 

Ljx) 

where L(x) is a Laurent polynomial, Ui are free of x and aj are positive integers. We 
assume the denominator factors \ —UiX ai are coprime to each other. This is guaranteed 
by the first step in subsection 3.1. 

The partial fraction decomposition of E in x should be 

E = P (x)+ ?4 + ±-^, (4) 

X K z — ' 1 — UiX at 
i=l 

where P(x),p(x), A^x) are all polynomials and &egp(x) < k, deg A^x) < di. If written 
in proper form, we shall have 

if mx* < 1; 



Aj{x) I 1 - Ui x a 

1 aiX — = ^^-r, if UiX a * > 1. 

-'/,.'•" 1 - 7—^- -Ui 1 



UiX l ' 

Since x~ ai Ai(x) contains only negative powers of x, we obtain 



CT£ = P(0)+ V ^(0), (5) 



UiX a i <1 



where the sum ranges over all i such that UiX ai is small. For this reason, the denom- 
inator factor 1 — Ujx" is said to be contributing if u,ix ai is small and is said to be not 
contributing if UiX ai is large. 
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It will be convenient to use the following notation. 

(E, 1 - u lX a % = CT 1 —-E(1 - mx^) = A(0). 

x 1 — UiX ai 

One can think that when taking constant term in x, only the single underlined denom- 
inator factor contributes. In this view it should be clear that (E, 1 — ux a \ x = if the 
denominator of E is coprime to 1 — ux a . Thus (jSJ) can be rewritten as 

CT E = P(0) + J" X (uiX ai < 1) (E, 1 - Ui x a %, (6) 

i 

where x(S) is 1 if the statement S is true and if S is false. 

Next we show that we need not compute P (0). Note that P(x) could be computed 
by polynomial division. But the polynomial division algorithm is very expensive in our 
situation because the number of the denominator factors and the number of variables 
may be large. Our first observation is that P (x) = if E is proper in x, i.e., the degree 
in the numerator is less than the degree in the denominator. To see that we can avoids 
computing P(0) in general, we need the following lemma. 

Lemma 3. Let E be as in fl3]). If E is a proper rational function, then 

n 

CT E = < 1)(E, 1 - u iX a %- (7) 

x i! — ' 

i=l 

If E\ x= o exists, then 

n 

CTE = E\ X=0 - > 1)(E,1- Ui x a %. flTf) 

X ' ' 

1=1 

Proof. The first equality is obvious. For the second equality, since E\ x= § exists, p(x) 
must be in 01]). Now setting x = gives 

n 

E\ x=0 = P(0) + MO) = CTE + J2 > 1) (E, 1 - u lX a %. 

i i=l 

The lemma then follows. □ 

Formula (j7[J is a kind of dual of ([7]). For this reason, we also call the denominator 
factor 1 — u.iX ai with u,ix ai > 1 dually contributing. If E is proper and has no pole at 
x = then both formulas ([7]) and (j?[J applies and we can choose the simpler one. 

Now we are ready to reduce the computation of CT X E to the contribution from a 
single denominator factor. Split L(x) as L\(x) +L 2 (x), where L\ contains only positive 
powers in x and L 2 contains only nonpositive powers in x. Now E is written as E\ + E 2 
where Ej, j = 1,2, are similar to E except that L(x) is replaced by Lj(x). Clearly 
Ei\ x =o = and E 2 is proper. It follows from Lemma [3] that 

CTE = J2x(uiX a > < l)(E 2 ,l-u l x a '\ x -J2x(n l x a ' > 1)(E 1 , 1 - Ui x a %. (8) 

i i 
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4.2 A recursion for the contribution of a single factor 

The contribution of a linear factor is easy: 

(E, 1 - ux\ x = CT — — — E(l - ux) = E(l - ux)\ x=1/u . 

x 1 — UX 

However, effective computation for nonlinear factor was a long standing problem. One 
can factor 1 — ux a into linear factors using roots of unities, but there is no simple way 
to get rid of the roots of unities in the final outcome. We present here a Euclid-style 
algorithm dealing with the nonlinear case, by repeated application of the following 
recursion. 

Proposition 4. Let E be as in (j3J) and let 1 — ux a be a denominator factor. Then we 
can find E' such that 

(E, 1 - ux a \ x = - J2(E', 1 - v lX %, 

i 

where < hi < a/2 for all i and the number of terms is at most n — 1. 

Proof. Without loss of generality, we may assume ux a = U\x ai . We need to compute 

IE, 1 - ux a L = CT „ = AAO). 

The following characterization of A\ (x) is well-known: 

= E(l -ux a ) (mod (1 -ux a )). 

This is simply obtained by multiplying both sides of (j4|) by (1 — ux a ) and then mod- 
ulo the ideal (1 — ux a ) generated by 1 — ux a . The A±(x) is the unique polynomial 
representation satisfying deg^i(x) < a. 

To compute Ai(x), we can change the terms by their simpler representations and 
finally take remainder when dividing by 1 — ux a . The following clearly holds, 

x m = u~ e x r (mod (1 - ux a )) if m = £a + r. 

Particularly, the remainder rem(x m , l—ux a , x) and the signed remainder srem(x m , 1 — 
ux a , x) of x m when dividing by 1 — ux a is defined to be 

rem(x m , 1 — ux a , x) = u~ l x r , where m = £a + r, < r < a. (9) 
srem(x m , 1 — ux a , x) = u~ l x r , where m = £a + r, —a/2 < r < a/2. (10) 

These definitions linearly extends for Laurent polynomials. 

The new idea is that A±(0) can be cracked out by using a better representative of 
Ai(x) + (1 — ux a ) instead of the explicit formula of A\(x). Clearly we have 

AAx) = t~ r- (mod (1 - ux a )). 

nr=2( 1 -^ srem(x a ',l-ux a ,x)) v x /; 
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This can be rewritten in the following form: 

±ML(x) . , H 

where M is a monomial and < hi < a/2 for all z. 
Now we have to split into two cases: 

i) if all the fo, are 0, then we immediately obtain 

Ai(x) = j^fj — ^ -y rem(±ML(x), 1 — ux a , x) 

and -Ai(O) can be easily computed. 

ii) if at least one of hi is great than 0, then rewrite 

Mix) = X f {X) (mod<l-^», 

where 

L'(x) = ± rem(x~ 1 ML(x), 1 - ux a ) 
is a polynomial in x of degree less than a. Now comes the crucial observation: 

A 1 (0) = CT. 1 XL ' {X) 



x 1 — UX a nr=2(l — ViX hi ) 

In our notation, this is just 

(E, 1 - ux a \ x = (E f , 1 - ux a \ x , (11) 

where 

h 



1 — MX" nf=2(l — ViX b% ) 

is a proper rational function with E'\ x=0 = 0. It follows by the partial fraction decom- 
position of E' and then setting x = that 

n 

(E 1 , 1 - ux a \ x = - J2( E '> 1 - v i x %- ( 12 ) 

8=2 

Note that the terms for 6^ = vanish. The proposition then follows. □ 

The Ell package in [18] computes ^i(O) by first finding A\{x) and then setting 
x = 0. This is not a good strategy when the number of variables or n is large. The 
explicit formula of Ai(x) may be very huge. To see this, consider the polynomial 
representative of 1 — vx b for b coprime to a. We have 

1 x- ab {x ab u b - 1 + 1 - x ab v a ) x~ ab {l - (vx b ) a ) x~ ab (l - (ux a ) b ) 



l-vx b {u b - v a )(l - vx b ) {u b - v a ){l - vx b ) {u b - v a ){l - vx b )' 
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Now the second term vanishes when modulo 1 — ux a , so we are left with 

i — ab a ~ 1 

*—r = -4 V (vx b ) j (mod (1 - ux a )). 

1 - vx b u b - v a ^ y > y \ u 

The gcd(a, b) > 1 case needs a bit more work. See [IS] for detailed information. 

Now take a = 3 as an example. If n = 21 and all the other 20 a, satisfying 
gcd(a,aj) = 1, then the formula for A\(x) will involve 20 three-term-factors (1 + 
UiX ai + u?x 2ai ). No miracles will happen if there are more than 3 variables, and A\(x) 
usually contains at least about 3 20 terms. While if we use Proposition HJ then bi = 1 
for all i and we only obtain 20 simple rational functions. 

Repeated application of Proposition @] will give a sum of simple rational functions. 
The number of terms only depends on and the process is similar to Euclid's gcd 
algorithm. Denote this number by f(i; a±, a 2 , ■ ■ ■ , a n ) where a corresponds to a\. Then 
f(i; ai, a,2, ■ ■ ■ , a n ) is recursively determined by the following rules: 

1. If Oj = for all j ^ i then /(i; a>i, a 2 , ■ ■ ■ , a n ) = 1; 

2. We have /(z; ai, a 2 , . . . , a n ) = /(i; &i, b 2 , . . . , b n ), where 6j = aj and 
6j = min( rem(aj, Oj), — rem(aj, a,)) for j 7^ z. 

3. If a>j < a,i/2 for all j ^ i, then 

/(z; 01, a 2 ,...,a n ) = /0'» °i» °2» • • • , 

If n = 1 then /(l; ai) = 1. If n = 2 we also have f(i; a±, a 2 ) = 1, because the sum 
of the recursion contains a single term. For n — 3, computation evidence suggests that 
/(l; ai, a 2 , a 3 ) is almost O(logai) 2 . For larger n, we raise the following problem: 

Let f(i\ a±, a 2 , ... , a n ) be defined as above. Prove or disprove that f(i; a±, a 2 , ... , a n ) 
is a polynomial in loga^. 

If the answer is positive, then we will obtain a simple polynomial algorithm at least 
for one variable elimination. But this might not be the right problem, since we have 
much freedom to apply the partial fraction technique, and the current approach is too 
elementary. 



5 The Maple package CTEuclid 

The algorithm in Section 4 is implemented by the Maple package CTEuclid, which can 
be downloaded from the following link 

https: / /www. dropbox.com/sh/scepodyyn4ff7ro/ffhqmeN7ne/MPA, 
where two demo files are provided to explain how to use the package. One file works 
on magic squares of order up to 5 and the other file works on the Sdd problem [9] of 
order up to 5. Both files contain the essential idea for attacking the order 6 case. Here 
we only report the Ehrhart series for magic squares of order 6. 
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CTEuclid is the first package designed for complicated or even benchmark problems. 
It also performs well for simple problems. The algorithm has three main steps with the 
second step split into 3 small steps, as described in Section 4.1. For the sake of clarity, 
we explain by several Knapsack type examples. 



5.1 Knapsack type examples 



Let do, cti, . . . , a n be positive integers with a = (ai, . . . , a n ), gcd(ai, . . . , a n ) = 1 and 
a i < a o fc> r all i, and let 

P = {x eR n : ax = a ,x > 0}. 

A basic problem is to determine if P contains an integer vector, or how many integer 
vectors does P contain. The former is called integer programming feasibility problem. 
See pQ for an introduction on this topic. Here we concentrate on the second problem, 
which is also called Knapsack type problems. Clearly we have 



#P = \x a °h n 7 r = CT - r - -. 

L 1 (1 - X ai ) • ■ • (1 - X a ") x x a °(l-x ai )---(l-x a ») 

Example 5. Compute the following constant term: 

CT \ 

x x 41 (l-x)(l-x 5 )(l-3; 14 )' 

Solution. We first add slack variables and get 

Qfji J^J _ Qfji _ Qfji 

x x 41 (l -xzi){l -x 5 z 2 )(l -x 14 z 3 ) ~ x x 41 (l -xzi)(l -x 5 z 2 )(l -x u 
where the three underlined factors are contributing. For the last factor we have 

I = CT — 

x X 41 {1 - XZx){l - X 5 Z 2 ) {1 - X U Z 3 ) x (1 -gZx)(l -X 5 2 2 ) (l ~X 14 Z 3 ) 

3 

_ _ Qrj ^3 

x (i - XZl )(i - x 5 z 2 ) (i - x u z 3 y 

where in our notation, only the first two factors are contributing. The flexibility of i 
algorithm allows us to obtain the following combined form: 

x x 41 (l - xzi)(l - x b z 2 ) {l - X 14 Z 3 ) x x 41 (l - - x b z 2 ) {l - X 14 . 

—41 S 

= ct - CT — - 

x (1 - x Zl ){l - x 5 z 2 ) {l - X l4 Z 3 ) x (1 -xz^jl -x 5 z 2 ) {l - x 14 z 3 ) 

—41 S 

x — xz 3 
(1 - XZi)(l - x 5 z 2 )(l - x u z 3 ) ' 



=° T (1- 
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Now the first contribution is simple: 

—41 3 41 —1 3 

Qrj x ~ xz 3 Z l ~ Z l Z 3 



[l-x Zl )(l - x 5 z 2 )(l - x u z 3 ) (1 - z^z 2 ){\ - z^ u z 3 ) 



The contribution of the second factor becomes 

—41 3 

CT 



x (1 — XZi)(l — x 5 z 2 )(l — x 1 z 3 z 2 ) 

,-y»5 ,y 1 ^,12 ,-yj 2 -y2 ,y 3 

Qrj x ^3 z 2 x ^3 Z 2 



z (1 — — £ 5 2 2 )(1 — XZ 3 zf) 



r 5 ^12^-1 ^2 ^2 ^3 

CT 



X Z<2 Z^ OC Z^ Z<2 



x (1 — XZi)(l — x 5 z 2 )(l — xz 3 1 z^ 



7 -5 J2 7 -l _ .-2 2 J r 4„-3 _ „4„- 

6\ ^2 ^3 ^1 ^3^2 ^3^2 ^3^2 



(1 - zfz 2 ){l - z^z^zl) (1 - z 3 zf Zl ){\ - z 5 3 Z2 U ) 

Thus we obtain a sum of three terms and come to step 3. We need to make a sub- 
stitution so that z^ 5 z 2 , z^ 14 z 3 , z^z^z^, z^z^ 14 are not equal to 1. One choice is 
z\ — 1, z 2 = t, z 3 = t. Then the constant term becomes 

1 - t 3 t u -t 5 t-t 1 - 1 3 t u - t 5 



(1 - t)(l - t) (1 - t)(l - t=) (l-t-a)(l-t-») (l-t)(l-t) 
Next we let t = 1 + s and take constant term in s separately to get 
CT l-(l + s) 3 CT (1 + s) n - (1 + s) 5 



S 2 a S 2(2 + s) 
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= [s 2 ](l - (1 + 3s + 3s 2 )) + [s 2 ](l + lis + 55s 2 - (1 + 5s + 10s 2 ))-(l - s/2 + s 2 /4)) 

45 6 

= -3 + [s](6 + 45s) (1/2 - s/4) = -3 + — - - = 18. 

□ 

Next we consider a relatively complicated example, which is Example 1 of [I]. 
Example 6. Show that the polytope P contains no integer lattice points, where 

P = {x el 3 : 12, 223xi + 12, 224x 2 + 36, 671x 3 = 149, 389, 505, x > 0}. 
Sketch of the Proof. The problem is equivalent to compute the following constant term: 
n r r \ 

x ^149389505 (1 _ ^12223^ M _ ^12224^ (J _ ^36671^ ' 

Our CTEuclid package will give a sum of 10 terms, which reduce by cancelation to a 
sum of 4 terms. By letting Z\ = 1, z 2 = t, z 3 = t we reach 

^12223 ^.24446 £-36670 £12223 
+ T. TTTTTttoo 7T - T. 7T + 



(t _ 1) (£12223 _ !) (£_!) (£12223 _ !) (£_ X ) (£24447 _ X ) (£ _ ^ (£24447 
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To eliminate the slack variable t = 1, we let t = e s and compute the constant term in 
s for each term separately. For instance, the first term becomes, 



CT 



s 




1) (e 12223s - 1) 




The four constant terms sum to 0. This completes the proof. 



□ 



Still from article [TJ, a very hard instance is the 4 dimensional polytope with 



Aardal and Lenstra can show that P contains no integer vectors in 0.01 seconds while 
the Branch and Bound method takes more than 8139 seconds. When dealing with 
this problem, CTEuclid gives 398 terms and returns in about 0.4 seconds. The 
advantage of our algorithm is that we can compute the number j^P for different a 
in about the same time. For instance, if = 89643481 x 1001, CTEuclid stile gives 
398 terms and returns 94267024658624993843 in about 0.4 seconds. It is worth noting 
that many of the 398 terms cancels with only 118 terms left. It might be interesting to 
understand how these terms cancels with each other. We also tried random examples 
with 100000 < <2j < 2500000, the performance is not nice when n > 5. 

The above examples show that even if the final answer is simple, the middle step 
may give complicated results. We make the following observation: Step 1 takes no 
time; Step 2 is the most important step, where we hope the number of terms nt we 
get is small; Step 3 of eliminating the slack variables is the most time consuming step, 
and its running time is almost linear to nt. This leads to the following two technical 
treatment when dealing with complicated or even benchmark problems. 

1. In step 2, we save some data for later use: the data for every 1000 terms we 
obtained are saved in different files, the data for all bad denominator factors are 
saved in a file. 

2. The running time for step 3 can be estimated according to the size of the data 
we saved in step 2. 

5.2 Computation for magic squares of order 6 

In subsection 13.31 we have convert the Ehrhart series for order n magic squares polytope 
to a constant term. Applying the package CTEuclid will give the desired Ehrhart series. 
There is no difficulty for the cases n = 3,4. Indeed the author's E112 package is faster 
in these two cases but meet memory problem for the n = 5 case. Our CTEuclid 
computes the n = 5 case in about 2700 seconds of cpu time. The n = 6 case is much 



a = 89643481, (a u ...,a s ) = (12223, 12224, 36674, 61119, 85569). 
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more complicated, and is not known before. The Ehrhart series for order 6 magic 
square has been put at Sloane's integer sequence website [15], A216039]. It looks like 



j / ) = (l-gfN 

MS6 (1 - ? 3 ) 5 (1 - g 4 ) 5 (1 - 5 5 ) 4 (1 - q'f (1 - q 1 ? (1 - Q 8 ) 2 (1 - q 9 ) (1 - q 10 ) 

= 1 + 9Qq + 14763g 2 + 957936g 3 + 33177456g 4 + 718506720g 5 + • • • 

where 

N = q 138 + 99 q 137 + 15057 g 136 + • • • 

+ 21382798694422310755770332936g 69 + • ■ ■ + 15057g 2 + 99g + 1. 

This result is obtained by modulo three large primes. The total cpu time is about 70 x 3 
days. The author would like to thank his officemates for running these computations 
on their computers. 

For the order 6 magic squares problem, we have to save data for the results obtained 
in the second step. We split the data into different files, with each file containing 1000 
terms. The most time consuming step is the third step, especially when the dimension 
is large. The good news is that we can estimate the running time. In the third step 
we need to eliminate the slack variables. When the data files are too large, we must 
do the computation modulo a large prime. Here we choose p\ = 636, 286, 597 for our 
first computation. Our estimate time for the third step is about 108 days of cpu time 
based on the observation that each file takes about 5 minutes. 

The flexibility of our algorithm allows us to reduce the number of terms obtained 
in the second step. The idea is that we can delay the adding of slack variables. That 
is, we can directly eliminate several variables as long as all terms in the outcome are 
valid, i.e., with no zero in the denominator. This is done by several tries and we 
need to control the size of the output. Next we apply our package to each term in 
the output. In this way, we reduced the size of the data files by one third and now 
the running time for the third step is about 70 days. These data can be reused for 
different modulo computations. Indeed we also did the computation modulo p 2 = 
460, 710, 223, 302, 903, 961 and p 3 = 1, 073, 129, 417, 747, 493, 923. 

Finally we use the Chinese remainder theorem to reconstruct the generating func- 
tion N/D. We conclude that this is the desired solution because the maximum coeffi- 
cient in N is about 6.797227759 x 10~ 17 pip 2 p3- Of course, if one needs a rigorous proof, 
one needs a bound for N\ q=1 , which should not be a hard problem. 



5.3 Possible improvements 

In summary, Step 1 is simple but necessary for general problems, but we shall con- 
sider delay this step in practical problems. Step 3 is of independent interest. Give a 
faster algorithm is possible but we will not discuss here. Step 2 is the crucial step, 
and the number of terms obtained in this step is dominant. We shall concentrate on 
improvements to this step. 
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Let us consider the linear Diophantine system Aa = b with augmented matrix 
(A,b). Clearly elementary row operation will not change the solution space. So it is 
possible to find a simpler matrix (A', b') with the same solution space. This step may be 
achieved by the well-known Lenstra Lenstra Lovasz's (LLL) basis reduction algorithm 
[T0| [H] . The author is considering upgrade the package by using this idea. 

There are many ideas to improve the algorithm. An interaction with known theories 
will give hints for improvements. For instance, Stanley's monster reciprocity theory 
contains some algorithmic ideas. See [T6l [19]. Our ultimate goal is to develop a 
classic algorithm in this subject. We believe that such an algorithm should contain the 
following features. 

1. We shall deal with the inhomogeneous case directly, avoid using Brion's theorem, 
which is too expensive when the number of vertices is large. 

2. We shall give a decomposition dealing with Laurent polynomial numerator in a 
uniform way. The outcome will be analogous of simplicial cones. 

3. We shall apply Barvinok's decomposition of simplicial cones into unimodular sim- 
plicial cones or the like, which we believe to be the genuine essence of Barvinok's 
algorithm. 

Acknowledgements: This work was supported by the Natural Science Foundation of China 
(11171231). 
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